40 research outputs found

    Exploiting Image-trained CNN Architectures for Unconstrained Video Classification

    We conduct an in-depth exploration of different strategies for event detection in videos using convolutional neural networks (CNNs) trained for image classification. We study different ways of performing spatial and temporal pooling, feature normalization, the choice of CNN layers, and the choice of classifiers. Making judicious choices along these dimensions leads to a significant increase in performance over the more naive approaches used to date. We evaluate our approach on the challenging TRECVID MED'14 dataset with two popular CNN architectures pretrained on ImageNet. On this MED'14 dataset, our methods, based entirely on image-trained CNN features, can outperform several state-of-the-art non-CNN models. Our proposed late fusion of CNN- and motion-based features can further increase the mean average precision (mAP) on MED'14 from 34.95% to 38.74%. The fusion approach also achieves state-of-the-art classification performance on the challenging UCF-101 dataset.
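
    As an illustration of the pooling and late-fusion strategy described above, the sketch below average-pools per-frame CNN features over time, L2-normalizes the result, and combines CNN-based and motion-based classifier scores with a weighted sum. It is a minimal sketch, not the authors' implementation; the feature dimensionality, the fusion weight, and the toy scores are assumptions made for illustration.

```python
import numpy as np

def pool_and_normalize(frame_features: np.ndarray) -> np.ndarray:
    """Average-pool per-frame CNN features over time, then L2-normalize.

    frame_features: array of shape (n_frames, feature_dim), e.g. activations
    of one CNN layer extracted from every sampled frame.
    """
    video_feature = frame_features.mean(axis=0)        # temporal average pooling
    norm = np.linalg.norm(video_feature) + 1e-12        # avoid division by zero
    return video_feature / norm

def late_fusion(cnn_score: float, motion_score: float, alpha: float = 0.5) -> float:
    """Weighted late fusion of CNN-based and motion-based classifier scores.

    alpha is a hypothetical fusion weight; in practice it would be tuned on
    a validation set.
    """
    return alpha * cnn_score + (1.0 - alpha) * motion_score

# Toy usage with random data standing in for real features and scores.
rng = np.random.default_rng(0)
frames = rng.normal(size=(120, 4096))                   # 120 frames, 4096-D features
video_feature = pool_and_normalize(frames)
fused = late_fusion(cnn_score=0.71, motion_score=0.64, alpha=0.6)
print(video_feature.shape, round(fused, 3))
```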

    The SURE-LET approach to image denoising

    Denoising is an essential step prior to any higher-level image-processing task such as segmentation or object tracking, because the undesirable corruption by noise is inherent to any physical acquisition device. When the measurements are performed by photosensors, one usually distinguishes between two main regimes: in the first scenario, the measured intensities are sufficiently high and the noise is assumed to be signal-independent. In the second scenario, only a few photons are detected, which leads to a strong signal-dependent degradation. When the noise is considered signal-independent, it is often modeled as an additive independent (typically Gaussian) random variable, whereas, otherwise, the measurements are commonly assumed to follow independent Poisson laws, whose underlying intensities are the unknown noise-free measures. We first consider the reduction of additive white Gaussian noise (AWGN). Contrary to most existing denoising algorithms, our approach does not require an explicit prior statistical modeling of the unknown data. Our driving principle is the minimization of a purely data-adaptive unbiased estimate of the mean-squared error (MSE) between the processed and the noise-free data. In the AWGN case, such an MSE estimate was first proposed by Stein and is known as "Stein's unbiased risk estimate" (SURE). We further develop the original SURE theory and propose a general methodology for fast and efficient multidimensional image denoising, which we call the SURE-LET approach. While SURE allows the quantitative monitoring of the denoising quality, the flexibility and the low computational complexity of our approach are ensured by a linear parameterization of the denoising process, expressed as a linear expansion of thresholds (LET). We propose several pointwise, multivariate, and multichannel thresholding functions applied to arbitrary (in particular, redundant) linear transformations of the input data, with a special focus on multiscale signal representations. We then transpose the SURE-LET approach to the estimation of Poisson intensities degraded by AWGN. The signal-dependent specificity of the Poisson statistics leads to the derivation of a new unbiased MSE estimate that we call "Poisson's unbiased risk estimate" (PURE) and requires more adaptive transform-domain thresholding rules. In a general PURE-LET framework, we first devise a fast interscale thresholding method restricted to the use of the (unnormalized) Haar wavelet transform. We then lift this restriction and show how the PURE-LET strategy can be used to design and optimize a wide class of nonlinear processing applied in an arbitrary (in particular, redundant) transform domain. We finally apply some of the proposed denoising algorithms to real multidimensional fluorescence microscopy images. Such in vivo imaging typically operates under low-illumination conditions and short exposure times; consequently, the random fluctuations of the measured fluorophore radiation are well described by a Poisson process degraded (or not) by AWGN. We experimentally validate this statistical measurement model, and we assess the performance of the PURE-LET algorithms in comparison with some state-of-the-art denoising methods. Our solution turns out to be very competitive both qualitatively and computationally, allowing fast and efficient denoising of the huge volumes of data that are nowadays routinely produced in biomedical imaging.
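
    The SURE-LET mechanism can be made concrete with a small example: for a linear expansion of thresholds f(w) = a1 f1(w) + a2 f2(w) applied to the coefficients of an orthonormal transform of AWGN-corrupted data, minimizing SURE over the weights reduces to solving a small linear system. The sketch below is a minimal single-transform illustration (an orthonormal DCT and a two-term LET with a hypothetical smooth threshold), not the multiscale, multichannel algorithms developed in the thesis.

```python
import numpy as np
from scipy.fft import dct, idct

def sure_let_denoise(y, sigma):
    """SURE-optimized two-term LET in an orthonormal DCT domain (illustrative only).

    Because the transform is orthonormal, the noise on the coefficients w is
    still i.i.d. Gaussian with standard deviation sigma, and the coefficient-
    domain MSE equals the signal-domain MSE.
    """
    w = dct(y, norm="ortho")
    t = 3.0 * sigma                                   # hypothetical threshold width
    e = np.exp(-((w / t) ** 8))                       # smooth shrinkage factor

    # LET basis functions f_k(w) and their divergences sum_i f_k'(w_i).
    F = np.stack([w, w * e], axis=1)
    div = np.array([w.size, np.sum(e * (1.0 - 8.0 * (w / t) ** 8))])

    # Minimizing SURE over the LET weights a gives the linear system M a = c,
    # with M = F^T F and c_k = f_k^T w - sigma^2 * div_k.
    a = np.linalg.solve(F.T @ F, F.T @ w - sigma**2 * div)
    return idct(F @ a, norm="ortho")

# Toy usage on a noisy step signal.
rng = np.random.default_rng(0)
x = np.concatenate([np.zeros(256), np.ones(256)])
y = x + 0.2 * rng.standard_normal(x.size)
x_hat = sure_let_denoise(y, sigma=0.2)
print(f"noisy MSE: {np.mean((y - x) ** 2):.4f}, denoised MSE: {np.mean((x_hat - x) ** 2):.4f}")
```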

    A CURE for noisy magnetic resonance images: Chi-square unbiased risk estimation

    In this article, we derive an unbiased expression for the expected mean-squared error associated with continuously differentiable estimators of the noncentrality parameter of a chi-square random variable. We then consider the task of denoising squared-magnitude magnetic resonance image data, which are well modeled as independent noncentral chi-square random variables on two degrees of freedom. We consider two broad classes of linearly parameterized shrinkage estimators that can be optimized using our risk estimate, one in the general context of undecimated filterbank transforms, and another in the specific case of the unnormalized Haar wavelet transform. The resultant algorithms are computationally tractable and improve upon state-of-the-art methods for both simulated and actual magnetic resonance image data. Comment: 30 double-spaced pages, 11 figures; submitted for publication.
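
    As a reminder of the measurement model the article relies on, the snippet below simulates squared-magnitude MR data: each of the two channels of a complex signal with amplitude A is corrupted by independent Gaussian noise of variance sigma^2, so the squared magnitude divided by sigma^2 follows a noncentral chi-square law on two degrees of freedom with noncentrality (A/sigma)^2. The amplitude, noise level, and sample count are arbitrary; the snippet illustrates the noise model only, not the CURE estimator itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
A, sigma, n = 3.0, 1.0, 200_000                 # signal amplitude, noise std, sample count

# Real and imaginary channels, each with independent additive Gaussian noise.
phase = rng.uniform(0.0, 2.0 * np.pi, size=n)
re = A * np.cos(phase) + sigma * rng.standard_normal(n)
im = A * np.sin(phase) + sigma * rng.standard_normal(n)
sq_magnitude = re**2 + im**2

# The scaled squared magnitude is noncentral chi-square with 2 degrees of freedom.
empirical_mean = np.mean(sq_magnitude) / sigma**2
theoretical_mean = stats.ncx2.mean(df=2, nc=(A / sigma) ** 2)   # equals df + nc
print(f"empirical: {empirical_mean:.3f}, theoretical: {theoretical_mean:.3f}")
```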

    From Characters to Words: Hierarchical Pre-trained Language Model for Open-vocabulary Language Understanding

    Current state-of-the-art models for natural language understanding require a preprocessing step to convert raw text into discrete tokens. This process, known as tokenization, relies on a pre-built vocabulary of words or sub-word morphemes. This fixed vocabulary limits the model's robustness to spelling errors and its capacity to adapt to new domains. In this work, we introduce a novel open-vocabulary language model that adopts a hierarchical two-level approach: one at the word level and another at the sequence level. Concretely, we design an intra-word module that uses a shallow Transformer architecture to learn word representations from their characters, and a deep inter-word Transformer module that contextualizes each word representation by attending to the entire word sequence. Our model thus operates directly on character sequences with explicit awareness of word boundaries, but without a biased sub-word or word-level vocabulary. Experiments on various downstream tasks show that our method outperforms strong baselines. We also demonstrate that our hierarchical model is robust to textual corruption and domain shift. Comment: Accepted to the ACL 2023 Main Conference.
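
    A minimal PyTorch sketch of the two-level design described above is given below: a shallow intra-word Transformer encodes each word from its characters (here, raw bytes), and a deeper inter-word Transformer contextualizes the resulting word vectors. Layer counts, model width, byte-level character ids, and mean pooling are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class HierarchicalCharWordEncoder(nn.Module):
    """Shallow character-level (intra-word) encoder + deep word-level (inter-word) encoder."""

    def __init__(self, n_chars=256, d_model=128, intra_layers=2, inter_layers=6):
        super().__init__()
        self.char_embed = nn.Embedding(n_chars, d_model)
        intra = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        inter = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.intra_word = nn.TransformerEncoder(intra, num_layers=intra_layers)
        self.inter_word = nn.TransformerEncoder(inter, num_layers=inter_layers)

    def forward(self, char_ids):
        # char_ids: (n_words, max_word_len) character ids, one row per word.
        char_states = self.intra_word(self.char_embed(char_ids))
        word_vectors = char_states.mean(dim=1)             # pool characters into one vector per word
        return self.inter_word(word_vectors.unsqueeze(0))  # attend over the whole word sequence

# Toy usage: encode "open vocabulary model" as padded byte ids, one row per word.
# (Padding positions are included in the mean pooling here; a real model would mask them.)
words = ["open", "vocabulary", "model"]
max_len = max(len(w) for w in words)
char_ids = torch.zeros(len(words), max_len, dtype=torch.long)
for i, w in enumerate(words):
    char_ids[i, : len(w)] = torch.tensor(list(w.encode("utf-8")))
print(HierarchicalCharWordEncoder()(char_ids).shape)       # (1, 3, 128)
```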

    User Loss -- A Forced-Choice-Inspired Approach to Train Neural Networks directly by User Interaction

    In this paper, we investigate whether it is possible to train a neural network directly from user inputs. We consider this approach to be highly relevant for applications in which the point of optimality is not well defined and is user-dependent. Our application is medical image denoising, which is essential in fluoroscopy imaging. In this field, every user, i.e. every physician, has different preferences, and image quality needs to be tailored to each individual. To address this problem, we propose to construct a loss function derived from a forced-choice experiment. In order to make the learning problem feasible, we operate in the domain of precision learning, i.e., the network architecture is inspired by traditional signal-processing methods in order to reduce the number of trainable parameters. The architecture used for this is a Laplacian pyramid with only six trainable parameters. In the experimental results, we demonstrate that our approach can create models for two image experts who prefer different filter characteristics, trading sharpness against denoising. Moreover, models trained for a specific user perform best on that user's test data. This approach opens the way towards the implementation of direct user feedback in deep learning and is applicable to a wide range of applications. Comment: Accepted at BVM 2019; extended arXiv version with additional figures and details.
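
    To make the setting more concrete, the sketch below combines a six-parameter, Laplacian-pyramid-style denoiser with a generic pairwise-preference loss standing in for the forced-choice user loss. The undecimated pyramid, the blur kernel, and the Bradley-Terry-style loss are illustrative assumptions; the paper's exact architecture and loss formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LaplacianPyramidDenoiser(nn.Module):
    """Undecimated Laplacian-pyramid-style filter whose six band gains are the only trainable parameters."""

    def __init__(self, levels: int = 6):
        super().__init__()
        self.levels = levels
        self.gains = nn.Parameter(torch.ones(levels))        # the six trainable parameters
        k = torch.tensor([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # binomial low-pass kernel
        self.register_buffer("kernel", torch.outer(k, k)[None, None])

    def blur(self, x):
        return F.conv2d(x, self.kernel, padding=2)           # expects single-channel images (N, 1, H, W)

    def forward(self, x):
        out, current = 0.0, x
        for i in range(self.levels - 1):
            low = self.blur(current)
            out = out + self.gains[i] * (current - low)      # reweighted band-pass detail
            current = low
        return out + self.gains[-1] * current                # reweighted low-pass residual

def forced_choice_loss(score_chosen, score_rejected):
    """Generic pairwise-preference (Bradley-Terry style) loss: the user-chosen output should score higher."""
    return F.softplus(score_rejected - score_chosen).mean()

denoiser = LaplacianPyramidDenoiser()
noisy = torch.rand(1, 1, 64, 64)
print(denoiser(noisy).shape, sum(p.numel() for p in denoiser.parameters()))  # (1, 1, 64, 64) and 6
```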

    SURE-LET for Orthonormal Wavelet-Domain Video Denoising

    LMDX: Language Model-based Document Information Extraction and Localization

    Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP), improving the state-of-the-art on many existing tasks and exhibiting emergent capabilities. However, LLMs have not yet been successfully applied to semi-structured document information extraction, which is at the core of many document processing workflows and consists of extracting key entities from a visually rich document (VRD) given a predefined target schema. The main obstacles to LLM adoption for this task have been the absence of layout encoding within LLMs, which is critical for high-quality extraction, and the lack of a grounding mechanism to ensure the answer is not hallucinated. In this paper, we introduce Language Model-based Document Information Extraction and Localization (LMDX), a methodology to adapt arbitrary LLMs for document information extraction. LMDX can extract singular, repeated, and hierarchical entities, both with and without training data, while providing grounding guarantees and localizing the entities within the document. In particular, we apply LMDX to the PaLM 2-S LLM and evaluate it on the VRDU and CORD benchmarks, setting a new state-of-the-art and showing how LMDX enables the creation of high-quality, data-efficient parsers.
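
    The general recipe the abstract describes can be pictured with a rough sketch: serialize the OCR segments together with quantized layout coordinates into the prompt, then keep only extracted values that can be matched back to a source segment, which provides localization and a crude form of grounding. The prompt format, field names, and matching rule below are hypothetical illustrations, not the actual LMDX prompt or decoding scheme.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    x: float   # normalized [0, 1] page coordinates from OCR
    y: float

def build_prompt(segments, schema, bins=100):
    """Serialize OCR segments with quantized coordinates and append the target schema (illustrative format)."""
    lines = [f"{s.text} {int(s.x * (bins - 1)):02d}|{int(s.y * (bins - 1)):02d}" for s in segments]
    return "Document:\n" + "\n".join(lines) + f"\nExtract entities for schema: {schema}\n"

def ground(extraction, segments):
    """Keep only values that appear verbatim in a source segment, recording that segment's coordinates."""
    grounded = {}
    for entity, value in extraction.items():
        for s in segments:
            if value in s.text:
                grounded[entity] = {"value": value, "location": (s.x, s.y)}
                break
    return grounded

segments = [Segment("Invoice #12345", 0.12, 0.05), Segment("Total: $250.00", 0.70, 0.90)]
prompt = build_prompt(segments, {"invoice_id": "string", "total": "string"})
llm_output = {"invoice_id": "12345", "total": "$250.00", "vendor": "Acme"}   # pretend LLM answer
print(ground(llm_output, segments))   # "vendor" is dropped: no supporting segment in the document
```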

    A study of CP violation in $B^\pm \to DK^\pm$ and $B^\pm \to D\pi^\pm$ decays with $D \to K^0_{\rm S} K^\pm \pi^\mp$ final states

    A first study of CP violation in the decay modes $B^\pm \to [K^0_{\rm S} K^\pm \pi^\mp]_D h^\pm$ and $B^\pm \to [K^0_{\rm S} K^\mp \pi^\pm]_D h^\pm$, where $h$ labels a $K$ or $\pi$ meson and $D$ labels a $D^0$ or $\overline{D}^0$ meson, is performed. The analysis uses the LHCb data set collected in $pp$ collisions, corresponding to an integrated luminosity of 3 fb$^{-1}$. The analysis is sensitive to the CP-violating CKM phase $\gamma$ through seven observables: one charge asymmetry in each of the four modes and three ratios of the charge-integrated yields. The results are consistent with measurements of $\gamma$ using other decay modes.
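
    For orientation, the charge asymmetries mentioned above are, in their generic form, differences of CP-conjugate decay rates normalized by their sum; a sketch of the standard definition is given below (the paper's exact observable definitions and conventions may differ).

```latex
% Generic charge (CP) asymmetry for a B^\mp -> [f]_D h^\mp decay pair;
% illustrative definition only -- the paper's exact conventions may differ.
\[
  \mathcal{A}_h =
  \frac{\Gamma\bigl(B^- \to [f]_D\, h^-\bigr) - \Gamma\bigl(B^+ \to [\bar f]_D\, h^+\bigr)}
       {\Gamma\bigl(B^- \to [f]_D\, h^-\bigr) + \Gamma\bigl(B^+ \to [\bar f]_D\, h^+\bigr)},
  \qquad h \in \{K, \pi\}.
\]
```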

    Studies of beauty baryon decays to $D^0 p h^-$ and $\Lambda_c^+ h^-$ final states

    Study of forward Z + jet production in pp collisions at √s=7 TeV

    A measurement of the $Z(\rightarrow\mu^+\mu^-)$+jet production cross-section in $pp$ collisions at a centre-of-mass energy $\sqrt{s} = 7$ TeV is presented. The analysis is based on an integrated luminosity of $1.0\,\text{fb}^{-1}$ recorded by the LHCb experiment. Results are shown with two jet transverse momentum thresholds, 10 and 20 GeV, both for the overall cross-section within the fiducial volume and for six differential cross-section measurements. The fiducial volume requires that both the jet and the muons from the $Z$ boson decay are produced in the forward direction ($2.0 < \eta < 4.5$). The results show good agreement with theoretical predictions at the second-order expansion in the coupling of the strong interaction.
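
    As a concrete reading of the fiducial definition quoted above, the sketch below applies the stated requirements to toy events. The event container and field names are made up for illustration, and the real analysis involves much more (jet reconstruction, efficiency corrections, and so on).

```python
from dataclasses import dataclass

@dataclass
class Event:
    jet_pt: float        # GeV
    jet_eta: float
    mu_plus_eta: float
    mu_minus_eta: float

def in_fiducial_volume(ev, jet_pt_threshold=20.0):
    """Fiducial requirements quoted in the abstract: jet pT above the chosen threshold
    (10 or 20 GeV), with the jet and both muons from the Z decay in the forward region 2.0 < eta < 4.5."""
    forward = lambda eta: 2.0 < eta < 4.5
    return (ev.jet_pt > jet_pt_threshold
            and forward(ev.jet_eta)
            and forward(ev.mu_plus_eta)
            and forward(ev.mu_minus_eta))

events = [Event(25.0, 3.1, 2.5, 4.0), Event(12.0, 3.0, 2.2, 3.3), Event(30.0, 1.5, 2.8, 3.9)]
print([in_fiducial_volume(e) for e in events])   # [True, False, False]
```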